Passenger data from the Titanic
The dataset contains information about the passengers of the RMS Titanic, which sank on April 15, 1912, after colliding with an iceberg. The data includes attributes such as travel class, age, gender, number of siblings/spouses aboard, number of parents/children aboard, ticket price, and embarkation point. The dataset also records whether each passenger survived the disaster. The Titanic carried over 2,200 people, of whom over 1,500 perished, making this disaster one of the most tragic in maritime history. Columns:
- pclass - Ticket class
- survived - Whether the passenger survived the disaster
- name - Passenger's name
- sex - Passenger's gender
- age - Passenger's age
- sibsp - Number of siblings/spouses aboard
- parch - Number of parents/children aboard
- ticket - Ticket number
- fare - Ticket price
- cabin - Cabin number
- embarked - Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)
- boat - Lifeboat number
- body - Body number (if the passenger did not survive and the body was recovered)
- home.dest - Home and destination
Titanic Disaster in Numbers - Mateusz Nowakowski
Exploratory Data Analysis
1. General Data Overview
The subject of the analysis is the Titanic disaster. There were a total of 2,212 people on the Titanic: 1,320 passengers and 892 crew members. We have an almost complete list of all passengers. Note that the analysis concerns only passengers, as we do not have data on the crew. The dataset consists of 1,310 rows and 14 columns, compiled - as we can guess - after the disaster. Seven columns contain numbers and the other seven contain strings. At first glance, we can see that the dataset has quite a few gaps and that the data will require processing.
Source: https://en.wikipedia.org/wiki/Sinking_of_the_Titanic
<class 'pandas.core.frame.DataFrame'>

RangeIndex: 1310 entries, 0 to 1309

Data columns (total 14 columns):

| # | Column | Non-Null Count | Dtype |
|---|---|---|---|
| 0 | pclass | 1309 non-null | float64 |
| 1 | survived | 1309 non-null | float64 |
| 2 | name | 1309 non-null | object |
| 3 | sex | 1309 non-null | object |
| 4 | age | 1046 non-null | float64 |
| 5 | sibsp | 1309 non-null | float64 |
| 6 | parch | 1309 non-null | float64 |
| 7 | ticket | 1309 non-null | object |
| 8 | fare | 1308 non-null | float64 |
| 9 | cabin | 295 non-null | object |
| 10 | embarked | 1307 non-null | object |
| 11 | boat | 486 non-null | object |
| 12 | body | 121 non-null | float64 |
| 13 | home.dest | 745 non-null | object |

dtypes: float64(7), object(7)

memory usage: 143.4+ KB
2. Analysis of Missing Values
The dataset has many missing values. We will count and discuss them below.
| | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1309 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
0
| | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1308 | 3.0 | 0.0 | Zimmerman, Mr. Leo | male | 29.0 | 0.0 | 0.0 | 315082 | 7.875 | NaN | S | NaN | NaN | NaN |
Below we create a separate dataframe that contains the number of missing values for each column.
Let's discuss each column:
- once the completely empty last row (index 1309) is dropped, columns 'pclass', 'survived', 'name', 'sex', 'sibsp', 'parch' and 'ticket' contain no missing values. This is very good news, as these are key data for the analysis.
- columns 'fare' and 'embarked' have only one and two missing values, respectively. We will fill the missing 'fare' value with the median, while the 'embarked' column can be removed, as it won't be needed for further analysis.
- columns 'cabin' and 'home.dest' have many missing values, but these are rather peripheral data, so we will remove both columns.
- column 'body' (body number) has the most missing values, about 91%. The very fact that there is no body number tells us that the body was most likely not found. In this case, therefore, the lack of a number is valuable information. We will not interfere with this data.
- in column 'boat', 823 records are missing. These are probably people who, for various reasons, did not make it onto a lifeboat, so we won't fill in the missing data. The problem with this column, however, is that some people have been assigned to more than one boat. For this reason, the column will require transformation.
- I left the 'age' column for last. It is a bit of a problem, because this is very important information and we are missing 263 records, or 20%. I see two options here: 1. any analysis of correlation with age is conducted on a reduced sample; 2. we fill the gaps with the mean or median, so that we have a full sample, and any age correlation studies will be somewhat distorted but will apply to all passengers. Option 2 seems better.
| | Missing Values | Percentage |
|---|---|---|
| pclass | 0 | 0.000000 |
| survived | 0 | 0.000000 |
| name | 0 | 0.000000 |
| sex | 0 | 0.000000 |
| age | 263 | 20.091673 |
| sibsp | 0 | 0.000000 |
| parch | 0 | 0.000000 |
| ticket | 0 | 0.000000 |
| fare | 1 | 0.076394 |
| cabin | 1014 | 77.463713 |
| embarked | 2 | 0.152788 |
| boat | 823 | 62.872422 |
| body | 1188 | 90.756303 |
| home.dest | 564 | 43.086325 |
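
A table like the one above can be built in a few lines of pandas. This is a minimal sketch on a toy DataFrame (the values are illustrative and the variable names `missing` and `missing_table` are assumptions), not necessarily the code used in the notebook.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for the Titanic data (illustrative values only).
df = pd.DataFrame({
    "age": [29.0, np.nan, 2.0, np.nan],
    "fare": [211.34, 151.55, np.nan, 7.25],
    "sex": ["female", "male", "female", "male"],
})

# Count missing values per column and express them as a share of all rows.
missing = df.isnull().sum()
missing_table = pd.DataFrame({
    "Missing Values": missing,
    "Percentage": missing / len(df) * 100,
})
print(missing_table)
```

The percentage is computed against the total number of rows, so a fully empty row, if it is not dropped first, counts as one missing value in every column.
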
3. Data Transformation
Let me transform the data before we move on to analyzing individual variables. The data requires significant processing. I would like to change a few things before we start analysis and draw any conclusions.
Let's list all the changes:
- we remove columns 'ticket', 'embarked', 'cabin' and 'home.dest' because they won't be needed for what I want to show you.
- columns 'sibsp' (siblings/spouses) and 'parch' (parents/children) are combined into a single 'with family' column that only tells us whether someone traveled with family (1.0) or alone (0.0) - that should be enough
- we fix the data in the 'boat' column so that one passenger is assigned to only one boat.
- missing values in the 'fare' column are replaced with median
- missing values in the 'age' column are replaced with median
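
The steps listed above can be sketched as follows. This is a minimal example on a toy DataFrame, not the notebook's actual code. In particular, keeping the first boat listed is an assumption: it is consistent with the boat counts reported in this section (for example, boat 5 rises from 27 to 30 once the entries '5 7' and '5 9' are resolved), but the rule the author used is not shown.

```python
import numpy as np
import pandas as pd

# Toy frame with the relevant columns (illustrative values only).
df = pd.DataFrame({
    "sibsp": [0.0, 1.0, 0.0, 2.0],
    "parch": [0.0, 2.0, 0.0, 0.0],
    "ticket": ["24160", "113781", "315082", "2665"],
    "fare": [211.3375, np.nan, 7.875, 14.4542],
    "age": [29.0, 2.0, np.nan, 30.0],
    "boat": ["2", "5 7", np.nan, "13 15 B"],
})

# Drop a column that is not needed ('ticket' here; the report also drops
# 'embarked', 'cabin' and 'home.dest').
df = df.drop(columns=["ticket"])

# 1.0 if the passenger had any sibling/spouse or parent/child aboard, else 0.0.
df["with family"] = ((df["sibsp"] + df["parch"]) > 0).astype(float)
df = df.drop(columns=["sibsp", "parch"])

# ASSUMPTION: keep only the first boat listed; missing boats stay missing.
df["boat"] = df["boat"].str.split().str[0]

# Fill missing numeric values with the column median.
for col in ["fare", "age"]:
    df[col] = df[col].fillna(df[col].median())

print(df)
```
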
| | pclass | survived | name | sex | age | sibsp | parch | fare | boat | body |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.0 | 1.0 | Allen, Miss. Elisabeth Walton | female | 29.0 | 0.0 | 0.0 | 211.3375 | 2 | NaN |
| | name | survived | sex | age | pclass | with family | fare | boat | body |
|---|---|---|---|---|---|---|---|---|---|
| 902 | Johnston, Mr. Andrew G | 0.0 | male | NaN | 3.0 | 1.0 | 23.4500 | NaN | NaN |
| 636 | Arnold-Franchi, Mrs. Josef (Josefine Franchi) | 0.0 | female | 18.0 | 3.0 | 1.0 | 17.8000 | NaN | NaN |
| 463 | Jefferys, Mr. Ernest Wilfred | 0.0 | male | 22.0 | 2.0 | 1.0 | 31.5000 | NaN | NaN |
| 346 | Botsford, Mr. William Hull | 0.0 | male | 26.0 | 2.0 | 0.0 | 13.0000 | NaN | NaN |
| 143 | Harder, Mr. George Achilles | 1.0 | male | 25.0 | 1.0 | 1.0 | 55.4417 | 5 | NaN |
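Boat counts before cleaning (some passengers are listed with more than one boat):

| boat | count |
|---|---|
| 13 | 39 |
| C | 38 |
| 15 | 37 |
| 14 | 33 |
| 4 | 31 |
| 10 | 29 |
| 5 | 27 |
| 3 | 26 |
| 9 | 25 |
| 11 | 25 |
| 16 | 23 |
| 8 | 23 |
| 7 | 23 |
| D | 20 |
| 6 | 20 |
| 12 | 19 |
| 2 | 13 |
| A | 11 |
| B | 9 |
| 1 | 5 |
| 5 7 | 2 |
| C D | 2 |
| 13 15 | 2 |
| 5 9 | 1 |
| 8 10 | 1 |
| 13 15 B | 1 |
| 15 16 | 1 |

Boat counts after cleaning (one boat per passenger):

| boat | count |
|---|---|
| 13 | 42 |
| C | 40 |
| 15 | 38 |
| 14 | 33 |
| 4 | 31 |
| 5 | 30 |
| 10 | 29 |
| 3 | 26 |
| 9 | 25 |
| 11 | 25 |
| 8 | 24 |
| 16 | 23 |
| 7 | 23 |
| D | 20 |
| 6 | 20 |
| 12 | 19 |
| 2 | 13 |
| A | 11 |
| B | 9 |
| 1 | 5 |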
| | name | survived | sex | age | pclass | with family | fare | boat | body |
|---|---|---|---|---|---|---|---|---|---|
| 334 | Banfield, Mr. Frederick James | 0.0 | male | 28.0 | 2.0 | 0.0 | 10.5000 | NaN | NaN |
| 1078 | O'Dwyer, Miss. Ellen "Nellie" | 1.0 | female | 28.0 | 3.0 | 0.0 | 7.8792 | NaN | NaN |
| 825 | Goodwin, Master. Harold Victor | 0.0 | male | 9.0 | 3.0 | 1.0 | 46.9000 | NaN | NaN |
| 1296 | Wirz, Mr. Albert | 0.0 | male | 27.0 | 3.0 | 0.0 | 8.6625 | NaN | 131.0 |
| 296 | Thayer, Mrs. John Borland (Marian Longstreth M... | 1.0 | female | 39.0 | 1.0 | 1.0 | 110.8833 | 4 | NaN |
| column | missing values |
|---|---|
| name | 0 |
| survived | 0 |
| sex | 0 |
| age | 0 |
| pclass | 0 |
| with family | 0 |
| fare | 0 |
| boat | 823 |
| body | 1188 |
4. Single Variable Analysis
Now that we have properly processed our dataframe, we will answer basic questions about individual variables:
- 'survived' - How many people survived the disaster?
- 'sex' - How many women and men were there?
- 'age' - Who were the youngest and oldest passengers and what was the average age?
- 'pclass' - How many people traveled in each class?
- 'with family' - How many people traveled alone versus with family?
- 'fare' - What was the cheapest and most expensive ticket, and what was the average and median ticket price?
- 'boat' - How many lifeboats were there and how many passengers were assigned to each lifeboat?
- 'body' - How many passengers who did not survive were assigned a body number?
| | Survivors | Percentage |
|---|---|---|
| survived | | |
| 0.000000 | 809 | 62.000000 |
| 1.000000 | 500 | 38.000000 |
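
The count-and-percentage tables in this section all follow the same pattern. A minimal sketch on toy data (the column labels mirror the table above; the variable names are assumptions, and this is not necessarily how the notebook builds them):

```python
import pandas as pd

# Toy 'survived' column: 0.0 = did not survive, 1.0 = survived.
survived = pd.Series([0.0, 1.0, 0.0, 0.0, 1.0], name="survived")

# Count each category, then express the counts as rounded percentages.
counts = survived.value_counts()
summary = pd.DataFrame({
    "Survivors": counts,
    "Percentage": (counts / counts.sum() * 100).round(),
})
print(summary)
```
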
| | Gender Count | Percentage |
|---|---|---|
| sex | | |
| male | 843 | 64.000000 |
| female | 466 | 36.000000 |
| | age |
|---|---|
| min | 0.166700 |
| mean | 29.503183 |
| median | 28.000000 |
| max | 80.000000 |
| | Class Count | Percentage |
|---|---|---|
| pclass | | |
| 3.000000 | 709 | 54.000000 |
| 1.000000 | 323 | 25.000000 |
| 2.000000 | 277 | 21.000000 |
| | Family Count | Percentage |
|---|---|---|
| with family | | |
| 0.000000 | 790 | 60.000000 |
| 1.000000 | 519 | 40.000000 |
| | fare |
|---|---|
| min | 3.170800 |
| mean | 33.718995 |
| median | 14.500000 |
| max | 512.329200 |
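
The minimum of about 3.17 in the table above suggests that free (zero-fare) tickets were filtered out before the statistics were computed, since the report lists free tickets elsewhere. The sketch below, on toy fares, shows how much such a filter can shift the summary; the filter `fare > 0` is an assumption about how the table was produced.

```python
import pandas as pd

# Toy fares, including two free tickets (illustrative values only).
fare = pd.Series([0.0, 0.0, 7.25, 14.5, 26.0, 512.3292], name="fare")

stats = ["min", "mean", "median", "max"]
all_tickets = fare.agg(stats)             # includes free tickets
paid_tickets = fare[fare > 0].agg(stats)  # ASSUMPTION: zero fares excluded

comparison = pd.DataFrame({
    "all tickets": all_tickets,
    "paid tickets": paid_tickets,
})
print(comparison)
```

When quoting a mean or median fare, it helps to state which of the two samples it refers to.
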
| | People In the boat |
|---|---|
| boat | |
| 13 | 42 |
| C | 40 |
| 15 | 38 |
| 14 | 33 |
| 4 | 31 |
| 5 | 30 |
| 10 | 29 |
| 3 | 26 |
| 9 | 25 |
| 11 | 25 |
| 8 | 24 |
| 16 | 23 |
| 7 | 23 |
| D | 20 |
| 6 | 20 |
| 12 | 19 |
| 2 | 13 |
| A | 11 |
| B | 9 |
| 1 | 5 |
People In the boat 486 dtype: int64
| | Total Not Survived | Not Survived With Body Number | Percentage |
|---|---|---|---|
| 0 | 809 | 121 | 14.956737 |
Summary of the analysis of individual variables
After an initial analysis of the variables, we learn that:
- 500 passengers survived the disaster, which represents only 38% of all passengers.
- Among the passengers, there were 843 men and 466 women, accounting for 64% and 36%, respectively.
- The youngest passenger was only two months old, the oldest was 80 years old, the average age was nearly 30 years, and the median age was 28 years.
- 323 (25%) passengers traveled in 1st class, 277 (21%) in 2nd class, and as many as 709 (54%) in 3rd class.
- 790 (60%) passengers traveled without family members, while 519 (40%) traveled with family.
- The cheapest ticket among the passengers was free*. The cheapest paid ticket cost 3.17 GBP and the most expensive 512.33 GBP. Across all tickets, the average price was 33.28 GBP and the median 14.45 GBP; the fare table above excludes free tickets, which is why it shows a slightly higher mean (33.72 GBP) and median (14.50 GBP).
- There were 20 lifeboats on the Titanic, and 486 passengers in the dataset are assigned to one of them.
- Only 121 out of 809 passengers who did not survive the disaster were assigned a body number, indicating that only about 15% of the bodies of all passengers who perished were recovered.
*This is neither an error nor a missing value - there were passengers on the Titanic with free tickets (funded by the carrier). We will discuss free tickets on the Titanic during the analysis of the outliers.
5. Analysis of Relationships Between Variables
At this stage, we will focus on examining the relationships between variables. Initially, we will investigate the survival chances of passengers. We will try to determine which variables played the most significant role in the fight for survival. Do survival chances depend on age, gender, wealth, or whether someone traveled alone or with family? We will also discuss other interesting topics. Below is a list of questions I intend to answer:
- How many men and women were there in relation to how many men and women survived the disaster?
- How many men and women were there in relation to how many men and women survived the disaster, broken down by class?
- How many children (<18) were there in each class - how many children survived in each class?
- How many elderly people (50+) were there in each class - how many 50+ people survived in each class?
- Did passengers who paid more for their tickets have a higher chance of survival, considering the class breakdown?
- Did passengers traveling with family have a higher chance of survival than those traveling alone, considering the class breakdown, and is there a correlation for men who have the lowest survival rate?
- Distribution of people in lifeboats, considering the class breakdown. Were first-class passengers privileged - did they have their own lifeboats and more space?
| | Total | Survived | Total_Percentage | Survived_Percentage |
|---|---|---|---|---|
| sex | | | | |
| female | 466 | 339 | 35.599694 | 67.800000 |
| male | 843 | 161 | 64.400306 | 32.200000 |
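
Note that the 'Survived_Percentage' column above divides by the 500 survivors, so it gives each sex's share of all survivors, not the survival rate within each sex. The short sketch below computes both from the counts in the table; the new column names are chosen here for illustration.

```python
import pandas as pd

# Counts taken from the table above.
by_sex = pd.DataFrame(
    {"Total": [466, 843], "Survived": [339, 161]},
    index=pd.Index(["female", "male"], name="sex"),
)

# Share of all survivors who belong to each sex (what the table reports).
by_sex["share_of_survivors_pct"] = (
    by_sex["Survived"] / by_sex["Survived"].sum() * 100
)

# Survival rate within each sex (survivors divided by group size).
by_sex["survival_rate_pct"] = by_sex["Survived"] / by_sex["Total"] * 100

print(by_sex.round(1))
```

With these counts, the within-group survival rate is about 72.7% for women and about 19.1% for men.
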
| Count Type | Total | | Survived | | Survival Percentage | |
|---|---|---|---|---|---|---|
| Sex | female | male | female | male | female | male |
| pclass | | | | | | |
| 1.000000 | 144 | 179 | 139 | 61 | 96.527778 | 34.078212 |
| 2.000000 | 106 | 171 | 94 | 25 | 88.679245 | 14.619883 |
| 3.000000 | 216 | 493 | 106 | 75 | 49.074074 | 15.212982 |
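
A class-by-sex breakdown like the one above can be obtained with a grouped aggregation. This is a minimal sketch on toy rows (illustrative values, assumed variable names), not the notebook's own code.

```python
import pandas as pd

# Toy passengers (illustrative values only).
df = pd.DataFrame({
    "pclass": [1.0, 1.0, 1.0, 3.0, 3.0, 3.0, 3.0],
    "sex": ["female", "male", "male", "female", "female", "male", "male"],
    "survived": [1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0],
})

# Group size and number of survivors per (class, sex) pair.
grouped = df.groupby(["pclass", "sex"])["survived"].agg(
    Total="count", Survived="sum"
)
grouped["Survival Percentage"] = grouped["Survived"] / grouped["Total"] * 100

# Move 'sex' into the columns to get one row per class.
table = grouped.unstack("sex")
print(table)
```
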
| | Total Children | Survived | Survived (%) |
|---|---|---|---|
| pclass | | | |
| 1.000000 | 15 | 13 | 86.666667 |
| 2.000000 | 33 | 29 | 87.878788 |
| 3.000000 | 106 | 39 | 36.792453 |
| | Total Elders | Survived | Survived (%) |
|---|---|---|---|
| pclass | | | |
| 1.000000 | 64 | 34 | 53.125000 |
| 2.000000 | 20 | 3 | 15.000000 |
| 3.000000 | 11 | 1 | 9.090909 |
| | Class | Top 20% Survived | Top 20% Total | Top 20% Survival Rate (%) | Bottom 20% Survived | Bottom 20% Total | Bottom 20% Survival Rate (%) |
|---|---|---|---|---|---|---|---|
| 0 | 1.000000 | 45.000000 | 65 | 69.230769 | 31.000000 | 69 | 44.927536 |
| 1 | 2.000000 | 36.000000 | 60 | 60.000000 | 15.000000 | 56 | 26.785714 |
| 2 | 3.000000 | 38.000000 | 147 | 25.850340 | 50.000000 | 196 | 25.510204 |
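
One way to compare the most and least expensive tickets within each class is to cut each class at its own 20th and 80th fare percentiles. The sketch below does that on toy data. The exact cutoffs and tie handling used for the table above are not shown in the report, so the `<=` and `>=` comparisons here are assumptions.

```python
import pandas as pd

# Toy first-class passengers (illustrative values only).
df = pd.DataFrame({
    "pclass": [1.0] * 10,
    "fare": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0],
    "survived": [0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0],
})

rows = []
for pclass, group in df.groupby("pclass"):
    # Per-class fare thresholds (linear interpolation, the pandas default).
    low_cut = group["fare"].quantile(0.2)
    high_cut = group["fare"].quantile(0.8)
    top = group[group["fare"] >= high_cut]
    bottom = group[group["fare"] <= low_cut]
    rows.append({
        "Class": pclass,
        "Top 20% Total": len(top),
        "Top 20% Survival Rate (%)": top["survived"].mean() * 100,
        "Bottom 20% Total": len(bottom),
        "Bottom 20% Survival Rate (%)": bottom["survived"].mean() * 100,
    })

result = pd.DataFrame(rows)
print(result)
```

Because many passengers paid identical fares, ties at the thresholds can make the two groups larger or smaller than exactly 20% of the class.
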
| | Class | Survived_Alone | Survived_With_Family | Total_Alone | Total_With_Family | Survival_Rate_Alone | Survival_Rate_With_Family | Male_Survived_Alone | Male_Survived_With_Family | Male_Total_Alone | Male_Total_With_Family | Male_Survival_Rate_Alone | Male_Survival_Rate_With_Family |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1.000000 | 82.000000 | 118.000000 | 160.000000 | 163.000000 | 51.250000 | 72.392638 | 32.000000 | 29.000000 | 108.000000 | 71.000000 | 29.629630 | 40.845070 |
| 1 | 2.000000 | 48.000000 | 71.000000 | 158.000000 | 119.000000 | 30.379747 | 59.663866 | 12.000000 | 13.000000 | 116.000000 | 55.000000 | 10.344828 | 23.636364 |
| 2 | 3.000000 | 109.000000 | 72.000000 | 472.000000 | 237.000000 | 23.093220 | 30.379747 | 53.000000 | 22.000000 | 372.000000 | 121.000000 | 14.247312 | 18.181818 |
| pclass | 1.000000 | 2.000000 | 3.000000 |
|---|---|---|---|
| boat | | | |
| 1 | 5 | 0 | 0 |
| 10 | 8 | 15 | 6 |
| 11 | 6 | 14 | 5 |
| 12 | 0 | 17 | 2 |
| 13 | 1 | 12 | 29 |
| 14 | 5 | 23 | 5 |
| 15 | 1 | 1 | 36 |
| 16 | 0 | 3 | 20 |
| 2 | 7 | 0 | 6 |
| 3 | 26 | 0 | 0 |
| 4 | 24 | 7 | 0 |
| 5 | 30 | 0 | 0 |
| 6 | 19 | 0 | 1 |
| 7 | 22 | 1 | 0 |
| 8 | 24 | 0 | 0 |
| 9 | 6 | 16 | 3 |
| A | 3 | 0 | 8 |
| B | 3 | 1 | 5 |
| C | 2 | 0 | 38 |
| D | 9 | 2 | 9 |
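
A boat-by-class table of this shape is what `pd.crosstab` produces. A minimal sketch on toy data (illustrative values, assumed variable names), not necessarily the notebook's code:

```python
import numpy as np
import pandas as pd

# Toy passengers; one of them has no boat assigned.
df = pd.DataFrame({
    "boat": ["3", "3", "13", "13", "C", np.nan],
    "pclass": [1.0, 1.0, 3.0, 2.0, 3.0, 3.0],
})

# Keep only passengers with a boat, then count them per boat and class.
with_boat = df.dropna(subset=["boat"])
boats_by_class = pd.crosstab(with_boat["boat"], with_boat["pclass"])
print(boats_by_class)
```

Boat labels are strings, so the rows are sorted lexicographically ("1", "10", "11", ... before "2"), which matches the row order in the table above.
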
Summary of the Analysis of Relationships Between Variables
- There were 466 women and 843 men on board. 339 women survived (about 73% of all women), compared to only 161 men (about 19% of all men). Women made up about 68% of all survivors.
- In the first class, 97% of women and 34% of men survived. In the second class, 89% of women and only 15% of men survived. In the third class, only 49% of women and 15% of men survived.
- In the first class, 87% of children under 18 survived. In the second class, 88% survived, and in the third class, only 37% survived.
- In the first class, 53% of passengers aged 50+ survived. In the second class, 15% survived, and in the third class, 9% survived.
- Passengers in the 1st and 2nd class who paid more for their tickets had a better chance of survival. The ticket price in the 3rd class did not affect survival rates.
- Traveling with family had a positive impact on survival rates. This trend is most visible among 1st and 2nd class passengers, likely because wealthier women, who had the highest survival rates, typically did not travel alone at that time. I specifically highlighted men to see if those traveling with and caring for their families had a better chance of survival than those traveling alone. The table above shows that in every class, men traveling with family survived at a higher rate than men traveling alone (41% vs. 30% in first class, 24% vs. 10% in second, 18% vs. 14% in third). This agrees with ChatGPT, which, citing articles on the disaster, claims they definitely did.
- Officially, there was no class division during evacuation. Unofficially, first-class passengers were highly privileged compared to others. This was due to their cabins being closest to the lifeboats and the crew assisting first-class passengers first. First-class passengers were also the first to be informed by the crew about the severity of the situation. This information was crucial, as most passengers realized too late that the unsinkable Titanic would indeed sink. The chart above shows that boats 3-8 were almost exclusively filled with first-class passengers. Our DataFrame does not contain crew data, but we know from other sources that lifeboats were not full. This was likely to ensure more comfort for first-class passengers. The chaos and panic during evacuation and lack of proper crew training were contributing factors.
Source: ChatGPT, when asked for sources, cites the books: "A Night to Remember" by Walter Lord, "The Loss of the S.S. Titanic" by Lawrence Beesley, "Titanic: A Voyage of Discovery" by Tror Rowe. It's interesting to consider whether it actually has these books in its datasets or just claims to for appearance.
6. Analysis of Outliers
At this stage, we will examine values and information that significantly differ from the rest of the data. We will answer the following questions:
- How many small children under 2 (outliers) were there, and what was their survival rate?
- Who were the individuals with the most expensive tickets, and what was included in the ticket price that made it more than 150 times more expensive than the cheapest paid ticket?
- How many people had free tickets, why, and who were they?
- Why were there only 5 passengers in lifeboat number one, and who were they?
- Were there individuals who made it to a lifeboat but did not survive?
- Were there individuals who did not make it to any lifeboat but somehow survived?
| | Total Children | Survived | Survived (%) |
|---|---|---|---|
| pclass | | | |
| 1.000000 | 1 | 1 | 100.000000 |
| 2.000000 | 7 | 7 | 100.000000 |
| 3.000000 | 14 | 9 | 64.285714 |
| | | Total Rich | Survived | Survived (%) |
|---|---|---|---|---|
| pclass | sex | | | |
| 1.000000 | female | 53 | 50 | 94.339623 |
| | male | 31 | 10 | 32.258065 |
| | name | survived | sex | age | pclass | with family | fare | boat | body |
|---|---|---|---|---|---|---|---|---|---|
| 49 | Cardeza, Mr. Thomas Drake Martinez | 1.000000 | male | 36.000000 | 1.000000 | 1.000000 | 512.329200 | 3 | nan |
| 50 | Cardeza, Mrs. James Warburton Martinez (Charlotte Wardle Drake) | 1.000000 | female | 58.000000 | 1.000000 | 1.000000 | 512.329200 | 3 | nan |
| 183 | Lesurer, Mr. Gustave J | 1.000000 | male | 35.000000 | 1.000000 | 0.000000 | 512.329200 | 3 | nan |
| 302 | Ward, Miss. Anna | 1.000000 | female | 35.000000 | 1.000000 | 0.000000 | 512.329200 | 3 | nan |
| | name | survived | sex | age | pclass | with family | fare | boat | body |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Andrews, Mr. Thomas Jr | 0.000000 | male | 39.000000 | 1.000000 | 0.000000 | 0.000000 | nan | nan |
| 2 | Chisholm, Mr. Roderick Robert Crispin | 0.000000 | male | 28.000000 | 1.000000 | 0.000000 | 0.000000 | nan | nan |
| 3 | Fry, Mr. Richard | 0.000000 | male | 28.000000 | 1.000000 | 0.000000 | 0.000000 | nan | nan |
| 4 | Harrison, Mr. William | 0.000000 | male | 40.000000 | 1.000000 | 0.000000 | 0.000000 | nan | 110.000000 |
| 5 | Ismay, Mr. Joseph Bruce | 1.000000 | male | 49.000000 | 1.000000 | 0.000000 | 0.000000 | C | nan |
| 6 | Parr, Mr. William Henry Marsh | 0.000000 | male | 28.000000 | 1.000000 | 0.000000 | 0.000000 | nan | nan |
| 7 | Reuchlin, Jonkheer. John George | 0.000000 | male | 38.000000 | 1.000000 | 0.000000 | 0.000000 | nan | nan |
| 8 | Campbell, Mr. William | 0.000000 | male | 28.000000 | 2.000000 | 0.000000 | 0.000000 | nan | nan |
| 9 | Cunningham, Mr. Alfred Fleming | 0.000000 | male | 28.000000 | 2.000000 | 0.000000 | 0.000000 | nan | nan |
| 10 | Frost, Mr. Anthony Wood "Archie" | 0.000000 | male | 28.000000 | 2.000000 | 0.000000 | 0.000000 | nan | nan |
| 11 | Knight, Mr. Robert J | 0.000000 | male | 28.000000 | 2.000000 | 0.000000 | 0.000000 | nan | nan |
| 12 | Parkes, Mr. Francis "Frank" | 0.000000 | male | 28.000000 | 2.000000 | 0.000000 | 0.000000 | nan | nan |
| 13 | Watson, Mr. Ennis Hastings | 0.000000 | male | 28.000000 | 2.000000 | 0.000000 | 0.000000 | nan | nan |
| 14 | Johnson, Mr. Alfred | 0.000000 | male | 49.000000 | 3.000000 | 0.000000 | 0.000000 | nan | nan |
| 15 | Johnson, Mr. William Cahoone Jr | 0.000000 | male | 19.000000 | 3.000000 | 0.000000 | 0.000000 | nan | nan |
| 16 | Leonard, Mr. Lionel | 0.000000 | male | 36.000000 | 3.000000 | 0.000000 | 0.000000 | nan | nan |
| 17 | Tornquist, Mr. William Henry | 1.000000 | male | 25.000000 | 3.000000 | 0.000000 | 0.000000 | 15 | nan |
| | name | survived | sex | age | pclass | with family | fare | boat | body |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Duff Gordon, Lady. (Lucille Christiana Sutherland) ("Mrs Morgan") | 1.000000 | female | 48.000000 | 1.000000 | 1.000000 | 39.600000 | 1 | nan |
| 2 | Duff Gordon, Sir. Cosmo Edmund ("Mr Morgan") | 1.000000 | male | 49.000000 | 1.000000 | 1.000000 | 56.929200 | 1 | nan |
| 3 | Francatelli, Miss. Laura Mabel | 1.000000 | female | 30.000000 | 1.000000 | 0.000000 | 56.929200 | 1 | nan |
| 4 | Salomon, Mr. Abraham L | 1.000000 | male | 28.000000 | 1.000000 | 0.000000 | 26.000000 | 1 | nan |
| 5 | Stengel, Mr. Charles Emil Henry | 1.000000 | male | 54.000000 | 1.000000 | 1.000000 | 55.441700 | 1 | nan |
| | name | survived | sex | age | pclass | with family | fare | boat | body |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Beattie, Mr. Thomson | 0.000000 | male | 36.000000 | 1.000000 | 0.000000 | 75.241700 | A | nan |
| 2 | Hoyt, Mr. William Fisher | 0.000000 | male | 28.000000 | 1.000000 | 0.000000 | 30.695800 | 14 | nan |
| 3 | Renouf, Mr. Peter Henry | 0.000000 | male | 34.000000 | 2.000000 | 1.000000 | 21.000000 | 12 | nan |
| 4 | Backstrom, Mr. Karl Alfred | 0.000000 | male | 32.000000 | 3.000000 | 1.000000 | 15.850000 | D | nan |
| 5 | Harmer, Mr. Abraham (David Lishin) | 0.000000 | male | 25.000000 | 3.000000 | 0.000000 | 7.250000 | B | nan |
| 6 | Keefe, Mr. Arthur | 0.000000 | male | 28.000000 | 3.000000 | 0.000000 | 7.250000 | A | nan |
| 7 | Lindell, Mr. Edvard Bengtsson | 0.000000 | male | 36.000000 | 3.000000 | 1.000000 | 15.550000 | A | nan |
| 8 | Lindell, Mrs. Edvard Bengtsson (Elin Gerda Persson) | 0.000000 | female | 30.000000 | 3.000000 | 1.000000 | 15.550000 | A | nan |
| 9 | Yasbeck, Mr. Antoni | 0.000000 | male | 27.000000 | 3.000000 | 1.000000 | 14.454200 | C | nan |
| | name | survived | sex | age | pclass | with family | fare | boat | body |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Lurette, Miss. Elise | 1.000000 | female | 58.000000 | 1.000000 | 0.000000 | 146.520800 | nan | nan |
| 2 | Bystrom, Mrs. (Karolina) | 1.000000 | female | 42.000000 | 2.000000 | 0.000000 | 13.000000 | nan | nan |
| 3 | Doling, Miss. Elsie | 1.000000 | female | 18.000000 | 2.000000 | 1.000000 | 23.000000 | nan | nan |
| 4 | Doling, Mrs. John T (Ada Julia Bone) | 1.000000 | female | 34.000000 | 2.000000 | 1.000000 | 23.000000 | nan | nan |
| 5 | Ilett, Miss. Bertha | 1.000000 | female | 17.000000 | 2.000000 | 0.000000 | 10.500000 | nan | nan |
| 6 | Louch, Mrs. Charles Alexander (Alice Adelaide Slow) | 1.000000 | female | 42.000000 | 2.000000 | 1.000000 | 26.000000 | nan | nan |
| 7 | Nasser, Mrs. Nicholas (Adele Achem) | 1.000000 | female | 14.000000 | 2.000000 | 1.000000 | 30.070800 | nan | nan |
| 8 | Renouf, Mrs. Peter Henry (Lillian Jefferys) | 1.000000 | female | 30.000000 | 2.000000 | 1.000000 | 21.000000 | nan | nan |
| 9 | Trout, Mrs. William H (Jessie L) | 1.000000 | female | 28.000000 | 2.000000 | 0.000000 | 12.650000 | nan | nan |
| 10 | Backstrom, Mrs. Karl Alfred (Maria Mathilda Gustafsson) | 1.000000 | female | 33.000000 | 3.000000 | 1.000000 | 15.850000 | nan | nan |
| 11 | Drapkin, Miss. Jennie | 1.000000 | female | 23.000000 | 3.000000 | 0.000000 | 8.050000 | nan | nan |
| 12 | Heikkinen, Miss. Laina | 1.000000 | female | 26.000000 | 3.000000 | 0.000000 | 7.925000 | nan | nan |
| 13 | Honkanen, Miss. Eliina | 1.000000 | female | 27.000000 | 3.000000 | 0.000000 | 7.925000 | nan | nan |
| 14 | Kennedy, Mr. John | 1.000000 | male | 28.000000 | 3.000000 | 0.000000 | 7.750000 | nan | nan |
| 15 | McCormack, Mr. Thomas Joseph | 1.000000 | male | 28.000000 | 3.000000 | 0.000000 | 7.750000 | nan | nan |
| 16 | McGowan, Miss. Anna "Annie" | 1.000000 | female | 15.000000 | 3.000000 | 0.000000 | 8.029200 | nan | nan |
| 17 | Moussa, Mrs. (Mantoura Boulos) | 1.000000 | female | 28.000000 | 3.000000 | 0.000000 | 7.229200 | nan | nan |
| 18 | O'Brien, Mrs. Thomas (Johanna "Hannah" Godfrey) | 1.000000 | female | 28.000000 | 3.000000 | 1.000000 | 15.500000 | nan | nan |
| 19 | O'Dwyer, Miss. Ellen "Nellie" | 1.000000 | female | 28.000000 | 3.000000 | 0.000000 | 7.879200 | nan | nan |
| 20 | Osman, Mrs. Mara | 1.000000 | female | 31.000000 | 3.000000 | 0.000000 | 8.683300 | nan | nan |
| 21 | Shine, Miss. Ellen Natalia | 1.000000 | female | 28.000000 | 3.000000 | 0.000000 | 7.779200 | nan | nan |
| 22 | Wilkes, Mrs. James (Ellen Needs) | 1.000000 | female | 47.000000 | 3.000000 | 1.000000 | 7.000000 | nan | nan |
| 23 | Yasbeck, Mrs. Antoni (Selini Alexander) | 1.000000 | female | 15.000000 | 3.000000 | 1.000000 | 14.454200 | nan | nan |
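
The last two tables are the result of combining two conditions: whether a boat is recorded and whether the passenger survived. A minimal sketch on toy rows (the names and values are placeholders, and the variable names are assumptions):

```python
import numpy as np
import pandas as pd

# Toy passengers (placeholder names, illustrative values only).
df = pd.DataFrame({
    "name": ["A", "B", "C", "D"],
    "survived": [1.0, 0.0, 1.0, 0.0],
    "boat": ["3", "A", np.nan, np.nan],
})

has_boat = df["boat"].notna()

# Passengers with a recorded lifeboat who nevertheless did not survive.
in_boat_but_died = df[has_boat & (df["survived"] == 0)]

# Passengers with no recorded lifeboat who nevertheless survived.
survived_without_boat = df[~has_boat & (df["survived"] == 1)]

print(in_boat_but_died)
print(survived_without_boat)
```
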
Summary of Outlier Analysis
- There were a total of 22 small children under the age of 2 on the Titanic. Five of them did not survive, and all five traveled in 3rd class. Being in a privileged group was not enough to survive in 3rd class. The 3rd class had difficulty accessing lifeboats because a large part of the ship was inaccessible to them and was barred by gates.
- The most expensive ticket on the Titanic cost 512 GBP. This was about 35 times the median price of all tickets. The chart above illustrates the vast gap between the wealthiest and the average passengers. It was an astronomical sum for those times. For comparison, a house in England could be bought for 250 GBP. The price of the most expensive ticket included a "Parlor Suite" apartment with two bedrooms and a private patio. According to our list, only four people could afford such luxury. Additionally, we know that the rest of the 'Parlor Suite' apartments were not for sale but were reserved for the line's owners and VIP guests for promotional purposes. As it turned out, room service, electric blankets and pillows, and access to the gym were not as significant an advantage as the fact that these apartments were on the same deck as the lifeboats.
- According to the dataset, 17 people traveled on the Titanic for free. This is not a data error. Some passengers indeed did not pay for their tickets. In the first class, this included the chairman of the White Star Line, Mr. Joseph Bruce Ismay, and his colleagues and close friends (they occupied the remaining apartments described in the previous point). In the 2nd and 3rd classes, these were mainly contractors and employees of the line who were not part of the permanent crew - including members of the famous orchestra that played until the end.
- Lifeboat number one carried only five passengers. This is not a data error. First-class passengers, the Duff Gordons, and three of their friends escaped the Titanic by organizing a private lifeboat. Sir Cosmo Duff Gordon was even charged with bribing the crew and refusing to help others but was later acquitted of the charges.
- There are nine people who have a lifeboat number but did not survive. This is not a data error. Sources tell us that the boat marked A partially took on water, and passengers sat in it knee-deep in icy water. As for the remaining five people, we are not sure. It can be assumed that hypothermia played a significant role.
- There are 23 people who are not assigned to any boat but survived nonetheless. However, I suspect a data error here. Most of these people are young women from the 2nd and 3rd classes. I have not found any information that could confirm or deny this, but I suspect an error - such as not providing a boat number.
Source: When asked about the source, ChatGPT cites books: "A Night to Remember" by Walter Lord, "The Loss of the S.S. Titanic" by Lawrence Beesley, "Titanic: A Voyage of Discovery" by Trev Rowe. It's interesting whether it actually has these books in its datasets or just claims so to 'look better.'
Observations and Final Conclusions
Finally, I added two charts that clearly illustrate the survival correlations. Yellow dots represent rescued individuals. Large dots indicate first class, medium dots indicate second class, and the smallest dots indicate third class.
What do the numbers tell us?
If we were to identify a single variable that had the greatest impact on survival, it was undoubtedly gender. The crew followed the principle of women and children first. This rule was imposed by both maritime law and custom. This is clearly reflected in the data. The second most important factor was class and ticket price - wealthier passengers were more privileged. The third factor was age, meaning that the youngest passengers had the highest chances of survival. The fourth factor was whether someone traveled with family or alone. However, this variable played a lesser role for the poorest passengers.
The data on class and ticket price perfectly reflect the social stratification of that time. The majority of victims were men traveling in third class. In this class, fatalities were even seen among the youngest passengers. The poorest passengers were isolated from the upper decks of the massive ship. They were also isolated from information. A large portion of passengers simply did not know they were in serious trouble. Even if there was space for them in the lifeboats, they either couldn't reach them or didn't know they should.
Could more people have been saved? The data suggests yes. There was still plenty of room in the lifeboats. However, this would not have significantly changed the scale of the tragedy. There were simply too few lifeboats. I believe we should not judge the decisions made during the evacuation. None of us knows how we would have acted. We should judge the decisions made "calmly" on an ordinary day in the comfort of one's office. These very decisions led to so many deaths. Evidence of this is the fact that after the disaster, a number of legal regulations regarding the number of lifeboats and navigation and speed in difficult sea conditions were changed.
The biggest problem with the Titanic was not the lack of crew training, the chaotic and inept evacuation, the lack of lifeboats, or excessive speed. The biggest problem was that everyone - absolutely everyone - from the poorest to the wealthiest succumbed to the illusion that the Titanic was unsinkable. This illusion led to a series of poor decisions before the disaster and to panic and chaos during the disaster. The belief in one's own infallibility, arrogance, and blind faith in technology - these were the main causes of this epic tragedy. I wonder if things are different today.